Maximum mutual information SPLICE transform for seen and unseen conditions
Abstract
SPLICE is a front-end technique for automatic speech recognition systems. It is a non-linear feature space transformation meant to increase recognition accuracy. Our previous work has shown how to train SPLICE to perform speech feature enhancement. This paper evaluates a maximum mutual information (MMI) based discriminative training method for SPLICE. Discriminative techniques tend to excel when the training and testing data are similar, and to degrade performance significantly otherwise. This paper explores both cases in detail using the Aurora 2 corpus. The overall recognition accuracy of the MMI-SPLICE system is slightly better than the Advanced Front End standard from ETSI, and much better than previous SPLICE training algorithms. Most notably, it achieves this without explicitly resorting to the standard techniques of environment modeling, noise modeling or spectral subtraction.
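The abstract describes SPLICE only as a non-linear feature space transformation. As a rough illustration, the sketch below assumes one common formulation from the SPLICE literature: a Gaussian mixture model over the noisy feature vector y, and an enhanced vector obtained by adding posterior-weighted correction vectors, x = y + sum_s p(s | y) r_s. The function names, the diagonal-covariance choice, and the bias-only correction are assumptions made for this example and are not taken from the paper.

```python
import math


def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at y."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v
        for yi, m, v in zip(y, mean, var)
    )


def posteriors(y, weights, means, variances):
    """Posterior p(s | y) for every mixture component s (assumed GMM over y)."""
    logs = [
        math.log(w) + log_gauss_diag(y, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def splice_transform(y, weights, means, variances, biases):
    """Assumed bias-only SPLICE form: x = y + sum_s p(s | y) * r_s."""
    post = posteriors(y, weights, means, variances)
    return [
        yi + sum(p * r[d] for p, r in zip(post, biases))
        for d, yi in enumerate(y)
    ]


# Hypothetical two-component model in two dimensions.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
biases = [[1.0, 1.0], [-1.0, -1.0]]  # correction vectors r_s

enhanced = splice_transform([0.0, 0.0], weights, means, variances, biases)
```

Because the posteriors depend on y, the mapping is non-linear even though each correction is a simple offset. In a discriminative setup such as the one the abstract mentions, the correction vectors would be the natural parameters to train; how the paper actually does this is not shown here.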
Similar resources
Error-weighted discriminative training for HMM parameter estimation
Optimizing discriminative objectives in HMM parameter training proved to outperform Maximum Likelihood-based parameter estimation in numerous studies. This paper extends the Maximum Mutual Information objective by applying utterance specific weighting factors that are adjusted for minimum sentence error. In addition to that, the paper investigates tuning separate numerator and denominator weigh...
Assessment of the Wavelet Transform for Noise Reduction in Simulated PET Images
Introduction: An efficient method of tomographic imaging in nuclear medicine is positron emission tomography (PET). Compared to SPECT, PET has the advantages of higher levels of sensitivity, spatial resolution and more accurate quantification. However, high noise levels in the image limit its diagnostic utility. Noise removal in nuclear medicine is traditionally based on Fourier decomposition o...
Improvements in linear transform based speaker adaptation
This paper presents three forms of linear transform based speaker adaptation that can give better performance than standard maximum likelihood linear regression (MLLR) adaptation. For unsupervised adaptation, a lattice-based technique is introduced which is compared to MLLR using confidence scores. For supervised adaptation, estimation of the adaptation matrices using the maximum mutual informa...
Combining Feature Space Discriminative Training with Long-Term Spectro-Temporal Features for Noise-Robust Speech Recognition
Discriminative training of feature space using maximum mutual information (fMMI) objective function has been shown to yield remarkable accuracy improvements. For noisy environments, fMMI can be regarded as an effective noise compensation algorithm and can play a significant role for noise robustness. Feature space speaker adaptation techniques such as feature space maximum likelihood linear reg...
Discriminative Linear Transforms for Speaker Adaptation
Linear transform adaptation techniques such as Maximum Likelihood Linear Regression (MLLR) are a popular and effective family of methods for speaker adaptation. MLLR estimates transform parameters for Gaussian means and variances using a maximum likelihood (ML) objective function. This paper discusses the use of an alternative discriminative objective function for linear transform estimation, w...
Publication date: 2005